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The epigenome of the human nnalaria vector Anopheles gambiae was characterized in 
nnidgut cells by mapping the distribution and levels of two post-translational histone 
modifications, H3K27ac and H3K27me3. These histone profiles were then correlated 
with levels of gene expression obtained by RNA-seq. Analysis of the transcriptome of 
A. gambiae midguts and salivary glands led to the discovery of 13,898 new transcripts 
not present in the most recent genome assembly. A subset of these transcripts is 
differentially expressed between midgut and salivary glands. The enrichment profiles of 
H3K27ac and H3K27me3 are mutually exclusive and associate with high and low levels of 
transcription, respectively. This distribution agrees with previous findings in Drosophila 
showing association of these two histone modifications with either active or inactive 
transcriptional states, including Polycomb-associated domains in silenced genes. This 
study provides a mosquito epigenomics platform for future comparative studies in other 
mosquito species, opening future investigations into the role of epigenetic processes in 
vector-borne systems of medical and economic importance. 
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INTRODUCTION 

Host-parasite interactions are among the most plastic and 
dynamic systems in nature. In these systems, epigenetic modifica- 
tions can provide an accessory source of fast-acting, reversible and 
readily available phenotypic variation that can be directly shaped 
by both host and parasite selection pressures (Bonduriansky and 
Day, 2009; Gomez-Diaz et al., 2012). Mosquitoes are important 
disease vectors v^orldwide. Anopheles gambiae is major vector of 
malaria in Africa, a disease that affects more than 300 million 
people and causes around 650,000 deaths each year (Who, 2013). 
The genome of A. gambiae was sequenced more than 10 years ago 
(Holt et al, 2002). Since then, the roadmap for malaria control 
strategies has been mostly centered on the study of vector genetic 
variation (Severson and Behura, 2012). Yet, genetic mechanisms 
alone are not sufficient to explain natural phenotypic variation in 
terms of vector competence (Lambrechts, 2010). Thus, it is crit- 
ical to expand our current understanding of mosquito-parasite 
interactions into an integrated view that includes both genetic 
and epigenetic dimensions. Although a great deal of progress has 
been made in deciphering the epigenetic code of Plasmodium par- 
asites (see Hoeijmakers et al, 2012 for a review), knowledge of 
epigenetic processes in mosquitoes is limited. 

Post-translational modification (PTM) of histones 
(acetylation, methylation, phosphorylation, sumoylation, 
and ubiquitinylation) is an important regulatory mechanism 
of transcriptional control that acts by altering chromatin struc- 
ture (Kouzarides, 2007; Bonasio et al, 2010). In Drosophila, 
epigenomic profiling of enrichment or depletion patterns for 



specific histone modifications has been established genome-wide 
and this information has been correlated with the location of 
regulatory elements and functional domains (Kharchenko et al., 
2011; Negre et al, 2011). For example H3K4me3 and H3K27ac 
are associated with active promoters, H3K4mel and H3K27ac 
are present in active enhancers, and H3K27me3 associates with 
Polycomb-mediated silencing. Surprisingly, no information is 
available in mosquitoes on histone PTMs or regulatory elements 
involved in chromatin-based epigenetic processes, such as the 
Polycomb and trithorax complexes, insulators, and enhancers. 

Available technologies now allow the analysis of whole 
epigenomes in a straightforward and cost-effective manner. 
Chromatin immunoprecipitation followed by sequencing (ChlP- 
seq) allows the mapping of protein-DNA interactions genome- 
wide. In this method DNA-protein complexes containing a 
specific protein of interest are immunoprecipitated from cross- 
linked, sonicated chromatin. DNA is purified from the enriched 
pool and used to generate a library for subsequent whole genome 
sequencing. The sequence reads for the enriched DNA are com- 
putationally aligned to the reference genome to define sharp peaks 
or broad blocks of modified histones. Since its development in 
2007 (Robertson et al., 2007), ChlP-seq has been extensively used 
to survey the genomic profiles of histones and their modifica- 
tions, transcription factors (TPs), DNA and histone modify^ing 
enzymes, the transcriptional machinery, and other chromatin 
associated proteins in various model organisms, including the 
fruit fly Drosophila melanogaster (Roy et al., 2010). One downside 
of this technique is, however, that its application in non-model 
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species is still limited by the availability of reference genomes. On 
the other hand, in cases where the annotation of the genome is 
still incomplete, such as A. gambiae, the combination of histone 
modification maps and gene expression information constitutes 
a powerful approach that allows the functional interpretation of 
chromatin marks as well as de novo prediction of regulatory ele- 
ments, genes and splice variants (Ernst and Kellis, 2010; Cheng 
et al, 2011). Previous studies have analyzed the transcriptome 
of A. gambiae using microarrays (Dana et al, 2005; Marinotti 
et al, 2005; Koutsos et al, 2007; Baker et al, 2011; Maccallum 
et al., 20 1 1 ) , but this approach gives only semi-quantitative results 
due to the small dynamic range of the microarray signal. A few 
studies have applied RNA-seq in this species on whole bodies 
or chemosensory appendages (Pitts et al., 2011; Vannini et al., 
2014). To our knowledge no study has specifically targeted RNA- 
seq analyses to tissues such as the midgut or salivary glands 
that play a key role in the development of Plasmodium parasites 
within the mosquito, and are thus directly implicated in malaria 
transmission. 

Using the technical and theoretical knowledge accumulated 
from chromatin studies in Drosophila, here we carry out an analy- 
sis of the transcriptome and histone PTMs in the human malaria 
vector A. gambiae. We characterize the distribution and levels 
of H3K27ac and H3K27me3, two key histone modifications that 
have been shown to play an important role in gene regulation, by 
performing ChlP-seq analyses in the midguts of adult mosquito 
females. The midgut is key tissue for Plasmodium development 
because the obligate passage of the parasite through this tissue 
results in large losses in parasite numbers, which may explain 
the frequent failure of the parasite to complete its life cycle in 
the mosquito (Blandin et al., 2008). We then correlate the his- 
tone profiles with levels of genome-wide gene expression obtained 
by RNA-seq, in order to infer functional states and predict puta- 
tive regulatory elements in the mosquito. This integrative analysis 
allowed us to link enrichment or depletion of active and repressive 
histone modifications to their target genes. The result is the first 
platform for mosquito comparative epigenomics that can serve 
as a basis for future studies on the biology of mosquitoes and 
mosquito-borne diseases. 

EXPERIMENTAL PROCEDURES 
MOSQUITO REARING AND DISSECTION 

Experiments were performed using an isogenic strain of A. gam- 
biae (strain Kisumu) maintained at the MIVEGEC insectarium 
under standard rearing conditions (27 ± 1°C, 70 ± 10% RH and 
16L: 8D photoperiod). Blood- fed adult females were allowed to 
lay eggs. On the day of hatching, larvae were seeded into plas- 
tic trays (25 x 35 x 7 cm) containing one liter of mineral water 
at a density of 300 individuals per tray. Larvae were provided 
with 200 mg of TetraMin® fish flakes the day after hatching, 
and, from then on, 400 mg TetraMin every 2 days until pupa- 
tion. On pupation, trays were placed inside an emergence cage 
(27 X 40 X 35 cm) and provided with an ad libitum source of 
10% sugar solution for the emerged adults. Dissection of the 
midguts and the salivary glands was performed on adult females 
aged 6-8 days. Tissues were maintained in ice-cold Schneider's 
insect culture medium (Sigma-Aldrich) to avoid degradation, and 



fresh tissues were immediately processed for chromatin and RNA 
analyses. 

RNA ISOLATION AND RNA-SEQ LIBRARY PREPARATION 

Total RNA was extracted from fresh mosquito tissues (~25 
midguts and ~ 50 salivary glands) using the mirVana™ RNA 
Isolation Kit (Ambion®) according to the manufacturer's pro- 
tocol. RNA concentration was quantified using a Qubit® 2.0 
Fluorometer, and RNA integrity was determined with an Agilent 
2100 Bioanalyzer. Illumina libraries were prepared and sequenced 
at the HudsonAlpha Institute for Biotechnology, using an 
Illumina HiSeq2000 sequencer. Standard directional RNA-seq 
library construction, 50 bp paired end reads with ribosomal 
reduction (RiboMinus'^" Eukaryote Kit, Ambion®). 

CHROMATIN ISOLATION, IMMUNOPRECIPITATION, AND SEQUENCING 

Antibodies to histone modifications H3K27ac (Abeam ab4729) 
and H3K27me3 (Millipore 07-449) used in this study have been 
previously tested and shown to specifically recognize the appro- 
priate epitopes (Kharchenko et al., 2011; Landt et al., 2012). 
The epitope sequences are conserved between Drosophila and 
A. gambiae. 

For immunoprecipitation experiments, mosquito tissues 
(15-20 midguts) were suspended in 100 [xl of Schneider's 
medium, fixed by adding 37% formaldehyde to a final concentra- 
tion of 1%, and incubated at room temperature for lOmin. The 
reaction was stopped by adding 1/10 volumes of 1.25 M glycine 
solution, incubated for 5 min before centrifugation at 4000 rpm 
for 3 min, and washed twice using cold IX PBS. Tissues were 
lysed in 200 |il of cell lysis buffer (5mM PIPES pH 8.0, 85 mM 
KCl, 0.5% NP40), and protease inhibitor cocktail (complete pro- 
tease inhibitor cocktail tablets, Roche), then homogenized with 
a pestle mixer and incubated on ice for 15 min. After centrifu- 
gation at 4000 rpm for 8 min, the nuclei were resuspended in 
200 (il of nuclear lysis buffer (50 mM Tris-HCl pH 8.0, 10 mM 
EDTA.Na2, 1% SDS, and protease inhibitors), and incubated on 
ice for 20 min. Then, 100 |il of IP dilution buffer (0.01% SDS, 
1.1% Trition X-100, 1.2 mM EDTA.Na2, 16.7 mM Tris-HCl pH 
8.0, 167 mM NaCl) were added. Sonication was performed using 
a Diagenode BioRuptor using 40 cycles of 30 s ON/ 30 s OFF, 
resulting in fragments around 300-500 bp, followed by centrifu- 
gation at 4°C for 10 min. Quality control of fixed chromatin was 
performed by visualization of seared fragments by agarose elec- 
trophoresis prior to proceeding with the chromatin IP. For the 
preclear step, 50 |xl of protein A magnetic beads (Dynabeads® 
Protein A, Novex®) were pre-coated using anti-rabbit IgG in 5% 
BSA. Beads were then added to each sample diluted 5X in IP dilu- 
tion buffer, and incubated overnight (OVN) at 4°C in a rocker. 
Seventy five |xl (5%) were taken as a control input from this 
solution and subjected to the same experimental procedure than 
the test sample, except for the antibody-binding step. For the 
test sample, 3 (il of Ab (1 |J.g/|il) pre-bound to 50 |il of protein 
A beads were added to the pre-cleared chromatin and incubated 
OVN at 4°C. After discarding the supernatant by magnetic sepa- 
ration, antibody-bound chromatin was subjected to 3 washes in 
low salt buffer (0.1% SDS, 1% Trition X-100, 2mM EDTA.Na2 
pH 8.0, 20 mM Tris-HCl pH 8.0, 150 mM NaCl), 2 washes in high 
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salt buffer (0.1% SDS, 1% Trition X-100, 2mM EDTA.Na2 pH 
8.0, 20 mM Tris-HCl pH 8.0, 500 mM NaCl), 2 washes in LiCl 
buffer (1 mM EDTA.Na2 pH 8.0, 10 mM Tris-HCl pH 8.0, 0.25 M 
LiCl, 1% NP40, 1% SDS), and one final wash in TE. Elutions 
were then performed by adding 2 x 200 [il of IP Elution Buffer 
(0.1 M NaHCOs, 1% SDS). Crosslinks were reversed by incu- 
bating the ligated chromatin at 65° C OVN with the addition of 
0.25 M NaCl, 10 mM EDTA, and 40 mM Tris. Extraction was per- 
formed by adding 8 (i-l of proteinase K ( 10 mg/ml) and incubation 
at 50° C for 2h. The DNA was extracted following a standard 
phenol-chloroform procedure. 

Sequencing libraries were prepared following the protocol 
described by Bowman et al. (2013) optimized for low quantity 
DNA. Before library preparation, ChIP DNA was pre-cleaned 
using a 1.8 X ratio of SPRI beads (Agencourt AMPure XP, 
Beckman Coulter). End preparation was performed by end repair 
(End-It DNA End Repair Kit, Epicenter Cat* ER0720) and addi- 
tion of "A" base to 3'ends (Klenow 3'-5'exo, NEB Cat* M0212S). 
Adaptor ligation was performed using a universal adaptor 
sequence and T4 ligase (NEB Cat* M0202S), and the ligated prod- 
ucts were SPRI cleaned (1.6 X beads ratio). Amplification of Chip 
libraries was performed by qPCR using an Applied Biosystems 
7500 Fast Real-Time PCR System. Reactions (50 (il) consisted of 
0.2 (iM universal primer, 0.2 |xM barcoded primer and 25 |xl of 
KARA SYBR FAST qPCR Master Mix (KapaBiosystems). Real time 
kinetics was monitored and library amplifications were stopped 
at the end of the extension after SYBR green reported reaction 
kinetics in the log phase (usually 15 cycles). A final SPRI purifi- 
cation with 1.0 X beads ratio was performed on the ChIP DNA 
libraries to remove adapter/primer dimers (120 bp) from the rest 
of the library (250 bp and above). ChIP libraries were sequenced 
at the HudsonAlpha Institute for Biotechnology using an lUumina 
HiSeq2000 sequencer. 

DATA ANALYSES 
RNA-seq data analysis 

RNA raw sequencing data were quality checked using 
FastQC, and trimmed on both ends, based on the quality 
estimates for each sequence, by using the FASTX-Toolkit 
(http://hannonlab.cshl.edu/fastx_toolkit). For the analysis of 
RNA paired directional reads we used the Tuxedo suite of 
programs as described by Trapnell et al. (2012). First, TopHat 
v2.0.10 (Trapnell et al, 2009) was used to map reads to the GTE 
annotation file of the most recent version of the A. gambiae 
genome, assembly AgamP4 PEST strain, available at VectorBase 
(https://www.vectorbase.org/). Reads were aligned using default 
parameters, except the option of library type set as first-strand 
for directional RNA-seq, and the inner mate distance param- 
eter r, which was estimated using the CoUectlnsertSizeMetrics 
tool implemented in the Picard's tools suite (http://picard. 
sourceforge.net/index.shtml). The Cufflinks v2.1.1 package 
(Roberts et al, 2011), which includes Cuffcompare, Cuffmerge 
and Cuffdiff; was used to quantify transcript abundance in 
terms of Fragment Per Kilobase Million (FPKM), allowing the 
discovery of new transcripts. Parameters were set as default using 
the GTE option, bias correction, and applying the total hits 
norm. We then compared the new transcriptomes to existing 



annotated transcripts using Cuffcompare. We ran Cuffdiff first 
on the conserved, known transcripts, and then on the com- 
bined transcriptome, which includes both previously annotated 
and new transcripts, to examine tissue specific differences in 
gene expression between midguts and salivary glands. The 
clusters of differently expressed genes were visualized using the 
R/Bioconductor cummeRbund package (http://compbio.mit. 
edu/cummeRbund/). Significant expression differences between 
tissues was defined as having both a -P-value < 0.01 and a false 
discovery rate (FDR) of <0.05. 

Chip-seq data analyses 

Quality metrics statistics of lUumina reads were obtained using 
FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). 
We used Bowtie v.1.0.0 (Langmead and Salzberg, 2012) to map 
reads from control and ChIP libraries to a custom genome index 
for A. gambiae; buUt using the most recent AgamP4 reference 
assembly using the howtie-build tool. We ran Bowtie using the — 
m option to filter repetitive sequences. To identify significandy 
enriched regions for H3K27ac we performed peak calling anal- 
ysis using MACS vl.4.2 (Zhang et al, 2008; Feng et al, 2012). 
Parameters were set to consider equal numbers of unique reads 
for input and ChIP samples, remove redundant reads (PCR 
duplicates), and a p-value cutoff of 1 x lOe-5. To analyze the dis- 
tribution of H3K27me3 we used SICER, which is optimized for 
the analysis of broad ChlP-enrichment regions (Zang et al, 2009; 
Xu et al., 2014) characteristic for this histone modification mark. 
Parameters were set to a window size of 200, gap size 600 and FDR 
cutoff of 1 X lOe-5, and the default value for the redundant rate 
cutoff 

We used BEDTools2.19.0 (Quinlan and Hall, 2010) to obtain 
genome coverage for each histone modification mark, which is the 
percentage of the genome in bp that shows significant enrichment 
of H3K27ac and H3K27me3, as well as the intersects of H3K27ac 
and H3K27me3 enrichment peaks with various genomic features 
(promoters, TSSs and genes). For these analyses we define pro- 
moters as regions located within a distance of 200 bp upstream 
and downstream of the transcription start site (TSS). 

To examine the association between H3K27ac and H3K27me3 
profiles genome-wide, we first normalized control and ChIP sam- 
ples based on read counts and calculated the difference using 
DeepTools (Ramirez et al., in press). The resulting enrichment 
files were binned into 10 bp windows covering the entire A. gam- 
biae genome reference sequence. We then referenced the 10 bin 
normalized and subtracted ChIP enrichments to gene coordinates 
of the AgamP4 gene set and calculated average enrichments per 
gene across a region that includes the gene body scaled to 500 bp, 
flanked by 200 bp before the TSS and after the Transcription 
Termination Site (TTS), respectively. The same type of analy- 
sis was performed by referencing ChIP data to gene coordinates 
from the "custom" gene set generated by CuffCompare, which 
merges novel and conserved transcripts (see above). We then 
applied K-means clustering to the matrix of enrichment val- 
ues for H3K27me3 to organize genes based on their similarities 
into 3 clusters using Cluster v3.0. Genes within each cluster 
were ordered by decreasing mean enrichment values. Groups of 
H3K27me3 enrichment were then used as anchors to order the 
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matrix of mean values of H3K27ac and gene expression. Clustered 
data was then visualized as a heatmap using deepTools. 

Combined analyses of ChIP and RNA-seq data were 
performed in R 3.1.0 (littp://www.r-project.org/) using 
Bioconductor (http://www.bioconductor.org). We used BedTools 
v2.19.0 and SamTools vl.4 (http://samtools.sourceforge.net) for 
file manipulation and conversion. Coverage analyses on bed and 
bam files were performed using BedTools and DeepTools suites. 

Functional analysis of genes 

Functional analyses were performed on genes obtained from 
the clustering approach corresponding to the various enrich- 
ment profiles (see results), and that intersect with H3K27ac 
and H3K27me3 peaks identified by MACS or SICER. Sequences 
for the various gene sets were retrieved by mapping VectorBase 
gene IDs against the UniProtKB database (http://beta.uniprot. 
org/). We employed Blast2Go (http://www.blast2go.de/) to con- 
duct Gene Ontology (GO) and KEGG (Kyoto Encyclopedia of 
Genes and Genomes) pathways annotations. Fisher's exact test 
was used to retrieve significantly enriched GO terms for genes 
marked with H3K27ac and H3K27me3 using Blast2Go. GO func- 
tional categories are defined as those containing at least five genes 
and a minimum enrichment score of 1.3 (p-value < 0.05). 

REAL-TIME PGR VALIDATION 

Validation by ChlP-qPCR was performed by targeting two house- 
keeping genes, actinSC and TER94, which typically show sig- 
nificant enrichment of H3K27ac in Drosophila (http://www. 
modencode.org/). For this assay we also targeted an intergenic 
region located 2 Kb upstream of these two genes as a negative con- 
trol. Reactions were performed using SYBR Green I Master kit 
(Applied Biosystems) in an Applied Biosystems Lightcycler. The 
PGR parameters are: 1 cycle of 10 min at 95°C, 40 cycles of 10 s at 
95°C, 10 s at 60° C, and 20 s at 72°C. PGR primer sequences are 
listed in Table SI. 

RESULTS 

GENE EXPRESSION AND TRANSCRIPT DISCOVERY 

RNA library processing and quality metrics are shown in Table 1. 
The expression analysis on the assembly of reads obtained from 
A. gambiae female mosquito tissues, midguts and salivary glands, 
resulted in a total of 13,969 novel transcripts that correspond to 
newly discovered non-annotated loci (class code "u": Unknown, 
intergenic transcript); combined GTF and BED files available 
at GEO GSE59773) as well as transcripts categorized as class 



code "="correspond in most cases to extensions of previously 
annotated genes based on intron matches. Most novel transcripts 
(6907 in midguts and 6943 in salivary glands) show relative abun- 
dances, expressed as Fragments Per Kilobase of exon model per 
Million mapped fragments (FPKM), above the 95 percentile of 
expression for the distribution of transcript abundances, which 
we estimated correspond to a value of 0.26 for midguts and 
0.56 for salivary glands (lower bound of the 95% confidence 
interval (CI) for the distribution of transcript abundance). Thus 
transcripts that show FPKM values above this threshold can be 
considered as reliable in terms of novel transcript discovery and 
gene expression differences. Blasting a subset of these transcripts 
against non-redundant protein databases confirms that they are 
novel (data not shown). This supports the notion that there is a 
large number of genes in the reference genome of A. gambiae that 
remain to be discovered and/or properly annotated. 

The comparison between transcripts obtained from midguts 
and salivary glands identified a total of 23 differentially expressed 
genes, 23 TSS and 21 isoforms switching events (p-value < 
0.01, (j-value < 0.05) (Figure 1). We did not detect signifi- 
cant differences in expression when considering the combined 
transcriptome, which includes both previously annotated and 
novel genes (Figure 1), probably due to low abundances of 
novel transcripts, which represent half of the combined data 
set, and which may have bias quantification results. Note that 
these are conservative estimates based on the threshold used by 
CufQinks. 

VALIDATION OF THE ChlP-SEQ APPROACH AND MAPPING OF H3K27ac 
AND H3K27me3 IN A. GAMBIAE 

Statistics of data processing are presented in Table 1. ChIP and 
control libraries passed all quality tests (Figure 81). The per- 
centage of reads mapped to the mosquito genome was 53.70% 
for H3K27ac (13,300,823 reads) and 58.19% for H3K27me3 
(12,832,729 reads). Of these, 13.73% for H3K27ac and 17.51% 
for H3K27me3 reads were identified by Bowtie as corresponding 
to repetitive sequences such as those typically found in trans- 
posons and centromeric or telomeric regions. The percentage 
of mapping is similar between the two target histones and the 
control. Results of the ChlP-qPCR validation are also consistent 
with data obtained by the ChlP-seq analysis (data not shown). 
Taken together, these results indicate that the ChlP-seq data 
provide an accurate representation of the genome-wide distribu- 
tion of histone modifications analyzed in A. gambiae and offer 
proof-of-concept validation of the approach. 



Table 1 | Statistics of ChlP-seq and RNA-seq data processing and quality checks. 



No. reads Reads mapped Reads excluded Reads analyzed %GC Quality score 



ChlP-seq ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^B 

H3K27ac 24770844 13300823(53.7%) 3401339(13.7%) 12286069(49.6%) 45 38 

H3K27me3 22053926 12832729(58.2%) 3861626(175%) 10249493(46.5%) 49 38 

Input 19456294 9684252(49.8%) 3235677(16.6%) 9068488(50.7%) 42 38 

Midgut 83046282 60149597 (72.4%) 52 37 

Salivary gland 78884200 57346015(72.7%) 52 37 
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FIGURE 1 I Differential expression analyses between midgut and 
salivary gland samples of A. gambiae. (A) Volcano plots displaying 
changes in gene expression, isoforms, and TSS usage (log2-fold change). 
Each symbol represents one gene that has detectable expression in 
either tissue. Relative differences in signal intensity along the X-axis 
reflect up-regulation in salivary glands when positive and up-regulation in 



midguts when negative. The Y-axis displays loglO-transformed p-values 
associated with tests of differential gene expression; red symbols 
indicate significant values for a threshold level of 0.05. (B) Density plots 
showing the expression level distribution for all genes, isoforms, and 
TSS, in the two tissues. FPKM, fragments per kb of transcript per million 
fragments mapped. 



Table 2 | Coverage of H3K27ac and H3K27me in the A. gambiae genome by chromosome. 



Chromosome 



H3K27ac 



H3K27me3 



Size (bp) 





Read depth 


% cov 


Read depth 


% cov 




2L 


4051943 


8.21 


10503751 


21.28 


49364325 


2R 


5094054 


8.28 


15188866 


24.68 


61545105 


3L 


2984278 


711 


8646592 


20.61 


41963435 


3R 


3645201 


6.85 


11609840 


21.82 


53200684 


UNKN 


111428 


0.26 


249640 


0.59 


42389979 


X 


1661196 


6.81 


6941572 


28.45 


24393108 


Genome 


17548100 


6.43 


53140261 


19.45 


273109044 



The depth of coverage in bp, the fraction of the chromosome covered, and the total size in bp for each chromosome and the complete genome is indicated. 



Coverage analyses of ChIP peaks regions reveal that 6.4% 
of the genome is occupied by H3K27ac, whereas 22% is occu- 
pied by H3K27me3. The fraction of the genome covered by each 
histone modification across different chromosomes is shown in 
Table 2. By applying a peak calling approach using either MACS 
or SICER we identified 6639 peaks for H3K27ac and 12,939 
peaks for H3K27me3 after input control normalization. Of the 
peaks identified for H3K27ac, 70.2% (4657) intersect with anno- 
tated genes or their promoters (here defined as regions located 
200 bp upstream and downstream of TSSs), whereas a small 
fraction of the remaining peaks (29.8% of total) correspond 
to intergenic regions. In the case of H3K27me3, only 41% of 



the peaks (5414) intersect genes or their promoters and, there- 
fore, most peaks appear to be intergenic. When considering 
the custom gene set that includes the newly discovered tran- 
scripts, the percentage of peaks that overlap to genes is 6156 
for H3K27ac and 7813 for H3K27me3, thus indicating that a 
considerable proportion of ChIP signal maps to non-annotated 
regions. 

GENOME-WIDE PROFILING OF HISTONE AND GENE 
TRANSCRIPTIONAL STATES 

To examine the patterns of histone acetylation and methyla- 
tion genome-wide, we considered A. gambiae genes plus flanking 
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regions of 200 bp upstream and downstream of the TSS and 
TTS, respectively. The distribution maps of the two histone 
modifications at genes revealed a mutually exclusive pattern of 
H3K27ac and H3K27me3 enrichment genome-wide (Figure 2). 
The same pattern was obtained when using the standard anno- 
tated AgamP4 or our custom gene-set as anchors. When exam- 
ining the average histone modification profiles across the gene 
regions, results show peak maxima of H3K27ac downstream 
of the TSSs of genes (Figure 2B), whereas H3K27me3 tends 
to occupy broader regions that cover the entire body of genes 
(Figure 2A). K-means clustering of ChlP-seq data resulted in 
the classification of genomic regions into three clusters based on 
their H3K27me3 mean enrichment profiles. Cluster 1 genes, (CI: 
12,913 genes) are depleted of H3K27me3 and contain high lev- 
els of H3K27ac. This cluster may correspond to genes that are 
highly transcribed. Cluster 3 genes (C3: 2021 genes) contain high 
levels of H3K27me3 and they are depleted of H3K27ac. Genes 
in this cluster probably correspond to Polycomb-silenced genes 
in Drosophila. Interestingly, a third gene cluster identified based 
on H3K27me3 levels, cluster 2 (C2: 11,976 genes) contains inter- 
mediate levels of H3K27me3 and are also generally depleted of 
H3K27ac but not as dramatically as genes in C3 (Figure 2D). 
Pairwise comparisons between clusters revealed significant differ- 
ences in mean enrichment values for genes marked with H3K27ac 
(CI = 0.320 ±0.013, C2= -0.100 ±0.004, C3 = -0.198 ± 
0.007; Kruskal-Wallis, all p-values < 0.001), or H3K27me3 
(CI = -1.281 ±0.005, C2 = -0.017 ±0.004, C3 = 2.525 ± 
0.033; Kruskal-Wallis, all p-values < 0.001; Mean values of 
Chip enrichment by gene, normalized and input subtracted). 
It is also noticeable from the heatmaps that at regions where 
H3K27me3 localize at high levels, the signal often expands 
toward the 3 and 5' end of the genes covering most of the 
flanking region. Likewise, a closer look at genes marked with 
H3K27ac reveals two sets of genes based on the distribution 
across the region: genes that have this histone modification at 
the promoter and coding region and a second group of inter- 
mediate enrichment where this mark is centered on promoters 
(Figure 2B). 

We next asked how these various histone modification pro- 
files relate to the transcriptional status of genes. Considering 
genes, annotated and novel, that intersect with H3K27ac and 
H3K27me peaks, we found significant differences in gene expres- 
sion between genes containing each of these two histone 
modification (Kruskal-Wallis chi-squared = 7138.69, df = 1, 
p-value = 0). Comparison of histone modification maps and 
gene expression genome-wide reveals a preferential localiza- 
tion of H3K27ac in active genes and promoters whereas genes 
that are marked with H3K27me3 are silent or are transcribed 
at low levels (Figure 2C). Indeed, this gene expression pat- 
tern is also true in quantitative terms when considering the 
gene clusters defined based on the levels of enrichment or 
depletion of histone PTMs (CI: 1.60 ± 0.029, C2: -1.22 ± 
0.03, C3: -2.18 ±0.05; Kruskal-Wallis chi-squared C1-C2 
= 4646.224, df = 1; C2-C3 = 238.38, df = 1; C1-C3 = 
2097.31, df = I, all p-values < 0.001; Mean gene expression 
by gene are calculated based on the log2 of FPKM values) 
(Figure 2E). 



FUNCTIONAL ANNOTATION OF HISTONE MODIFICATION PROFILES 

Genes clustered based on their histone modification profile, either 
marked with H3K27ac or H3K27me3, were assigned to GO acces- 
sion numbers and classified into functional categories under three 
major classes (biological process, cellular component, and molec- 
ular function). We found multiple GO categories associated with 
each cluster (Table S2). Genes enriched in either H3K27ac or 
H3K27me3, were assigned to multiple different KEGG pathways. 
Of these, pathways involved in metabolism of amino acids and 
signaling pathways were top ranked hits for genes marked with 
H3K27ac or H3K27me3 (Figure S2). The genes associated to each 
histone PTM (see Figure 2 for cluster names) also differ in the rel- 
ative enrichment of GO terms (Figure 3, p < 0.05). Particularly, 
genes in cluster 1, highly enriched in H3K27ac, and cluster 2, 
containing intermediate levels of H3K27ac and H3K27me3, dis- 
play a significant differential enrichment in GO terms associated 
with metabolic processes, immune, and signaling pathways. This 
is in marked contrast with genes in cluster 3, which display 
differential enrichment in GO terms related to membrane trans- 
porters and developmental processes (Figure 3, Table S2). One 
classic example of developmental genes that fall into this cate- 
gory and are associated with Polycomb and H3K27me3 domains 
in Drosophila are the homeobox genes (Schwartz et al, 2006) 
(Figure 4A, GO:0005634 Table S2). We then asked whether there 
is an association between the GO terms retrieved for genes 
differentially marked with H3K27ac and H3K27me3, and GO 
terms that have been linked to malaria infection and mosquito 
resistance to insecticides (Table S3). Interestingly, genes marked 
with these two histone modification marks include immunity 
related genes such as Toll, Immune Deficiency (IMD) and 
Janus kinase/signal transducers and activators of transcription 
(JAK/STAT) pathways; antimicrobial effectors such as defensins 
and cepropins; and metabolic genes such as the Vitellogenin gene. 
Examples of histone modification profiles of two of these can- 
didate genes, which are significantly associated with enriched 
GO terms, are shown in Figure 4B, and the histone modifica- 
tion profiles for each candidate gene examined are shown in 
Table S3. 

DISCUSSION 

Changes in chromatin state play an important role in a num- 
ber of biological processes, including responses to stress and 
environmental conditions. In organisms like Drosophila that lack 
DNA methylation, post-translational modifications of histones 
are an important layer of gene regulation (Bonasio et al., 2010; 
Bannister and Kouzarides, 2011). However, chromatin maps and 
the dynamics of epigenetic control in mosquitoes have remained 
to date unexplored. In order to fiU this gap we describe genome- 
wide maps of two histone modifications with key roles in gene 
regulation, H3K27ac and H3K27me3. To establish a link with 
gene expression, we then correlate the presence of these modifica- 
tions in specific genes with their transcriptional state in midguts 
and salivary glands of A. gamhiae females. 

In Drosophila, various combinations of active and repressive 
histone marks have been found to be associated with distinct 
chromatin functional states (Filion et al, 2010; Roy et al., 2010; 
Kharchenko et al, 2011; Negre et al, 2011; Yin et al, 2011). Two 
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FIGURE 2 I Genome-wide distribution of liistone modifications. 

Distribution of (A) H3K27me3 and (B) H3K27ac with respect to gene 
features in A. gambiae midguts. The enrichment or depletion is shown 
relative to chromatin input. The regions in the map comprise gene 
bodies flanked by a segment of 200 bp at the 5' end of TSSs and 
TTSs. Average profiles across gene regions ±200 bp for each histone 
modification are shown on top. (C) Heatmap of RNA-seq data showing 
the level gene expression, as read count, profiled along the region. In 



all heatmaps (A-C) genes were organized into 3 clusters based on 
their level of H3K27me3. For H3K27ac, cluster genes are ordered by 
descent but independent to H3K27me3. The color bars indicate the 
range of intensities based on ChIP enrichment, from red to blues for 
higher to low enrichment values. Boxplots showing the mean 
enrichment of H3K27ac and H3K27me3 by cluster (D), and mean level 
of gene expression by gene cluster (E). Significant pairwise 
comparisons are indicated by asterisks (see text). 
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FIGURE 3 1 Functional analysis of A. gambiae genes. The graphs 
show GO terms significantly associated with genes that show 
significant enrichment or depletion of H3K27ac and H3K27me3 at 
high levels (see Figure 2). Bars corresponds to number of 



sequences associated with each GO term. In the case of genes 
marked with H3K27ac from cluster 1 due to the number of 
records only most significant GO terms are shown (see Table S2 
for the complete list). 



of these histone modifications that display contrasting distribu- 
tion patterns and regulatory fijnctions include acetylation and 
methylation of lysine 27 of histone 3. Previous studies have shown 
that H3K27ac is typically enriched at enhancers, TSSs and coding 



regions of active genes, whereas H3K27me3 localizes to repressed 
loci (Tolhuis et al, 2006; Kharchenko et al, 2011; Kellner et al, 
2012; Van Bortle et al., 2012; Brown and Bachtrog, in press). 
Results described here for the mosquito A. gambiae agree with 



Frontiers in Genetics | Epigenomics and Epigenetics 



August 2014 | Volume 5 | Article 277 | 8 



Gomez-Diaz et al. 



The epigenome of the malaria vector 



[0-120] 

H3K27ac 



[0-1001 



Chromosome 2R 



^_ .L iiki,kyiii.uiij^j .i...L^yiiiii.iial..i.. ikyuiiiu^^iiiii 



AGAP0O4659-RA 



AGAP004660-HB 



Antennapedia homeotic protein 
(AGAP004659-AGAP004660) „s.P...6e.-R. 









;o-i20| 

H3K27ac 


-J 


Chromosome 2L 


:0-iDO| 






H3K27me3 


^ . .. .A .. - _ 




[0-50] 






RNAseq reads coverage 


.....J* iLiii.... 






21 


( ■ , , Gene 




AGAPO06747-RA 










n ■ ■ 
n ■ ■ 


inn ■ ■ mm ^^^^ (AGAP006747} 


F^K^H^H K^KI H ■lini 

■ ^^^1 I imi 

AGAP0D6749-RA 
1 > ■ 

AGAP0O67ae-RA 



[0-120] 

H3K27ac 



Chromosome 2L 



[0-100] 
H3K27me3 



RNAseq reads coverage 



ill ii 



Dscam (AGAP007092) 



FIGURE 4 I Histone modification levels in selected genes. (A) Enrichment 
profile of histone PTIVIs in two Hox genes of A. gambiae (Antennapedia 
homeotic proteins, AGAP004659-AGAP004660). These are classic examples 
of Drosophila developmental genes where H3K27me3 is highly enriched and 
distributed in large domains that encompass both genie and intergenic 
regions in cells where the Antennapedia gene is silenced. Examples of 



various histone modification profiles, based on the level of enrichment or 
depletion in H3K27ac or H3K27me3, in two candidate genes: (B) Rel2 
{AGAP006747 GO:0005515), and (C) Dscam (AGAP007092, GO: 
GO:0005515) (Tables S2, S3). The scale of the tracks is proportional to the 
number of sequencing reads for each histone modification. Gene expression 
in terms number of RNA reads mapped to each gene is also displayed. 



these observations and suggest a close association between the 
presence of H3K27ac and H3K27me3, and the transcriptional 
state of genes. 

Genome-wide histone maps in A. gambiae reveal that 
H3K27ac and H3K27me3 cover approximately 6 and 14% of 
the genome, respectively. These two genome fractions do not 



overlap and there is a mutually exclusive pattern of enrichment 
at most genes: H3K27me3-containing promoters and genes show 
no enrichment for H3K27ac, and, reciprocally, those contain- 
ing H3K27ac are not enriched for H3K27me3 (Figure 2). This 
is in agreement with previous studies in Drosophila, and sup- 
ports the notion that these two histone modifications also have 
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opposite biological functions in mosquitoes. Despite a general 
mutual exclusive pattern in the distribution of methylated and 
acetylated H3K27 sites, we found some regions of overlap, includ- 
ing both genie and intergenic regions. A possible explanation for 
this pattern would be promoter bivalency, which occurs when 
gene promoters contain both active and repressive modifications. 
These promoters are considered poised for activation (or repres- 
sion) at a later stage. But this pattern has been described in the 
context of differentiating cells or during development (Creyghton 
et al., 2010), and is restricted in principle to promoters con- 
taining activating histone H3K4me3 and repressive H3K27me3 
marks, not H3K27ac (Voigt et al., 2013). Alternatively, the pres- 
ence of the two histone modifications can be the result of 
heterogeneous enrichments across cells as it has been recently 
been shown using single-cell ChlP-seq technologies (Wang and 
Bodovitz, 2010). In the case of intergenic regions, co-localization 
of H3K27ac and H3K27me3 has been associated with the pres- 
ence of enhancer elements (Creyghton et al, 2010). To validate 
this hypothesis in mosquitoes it would be necessary to map other 
histone modifications such as H3K4mel that are diagnostic for 
this type of regulatory elements (Visel et al, 2009; Zhu et al., 
2013). 

The combined analysis of ChlP-seq and RNA-seq data in 
midgut cells of A. gambiae supports a relationship between tran- 
scription levels and profiles of histone PTMs. We find that the 
presence of H3K27ac coincides with actively transcribed genes, 
whereas H3K27me3 is mostly associated with clusters of repressed 
genes that show low levels of transcription. Our analyses also sug- 
gest that enrichment or depletion of H3K27ac and H3K27me3 
correlates not only with the transcriptional state of genes but 
also with the magnitude of their expression or silencing, which 
is in agreement with earlier studies in Drosophila (Negre et al., 
2011). For example, in flies and mammals it has been shown 
that H3K27me3 can form domains that can expand 100 kb or 
more and covers blocks of silent genes and intergenic regions, and 
expressed genes that are preferentially located in regions imme- 
diately flanking H3K27me3 BLOCs (Pauler et al, 2009; Young 
et al., 2011; Hou et al, 2012). Many of these domains can also 
contain Polycomb (Schwartz et al., 2006; Tolhuis et al., 2006; 
Schuettengruber et al., 2009). In this study, Polycomb-containing 
genes would correspond to those of cluster 3 in Figure 2. One 
example of this type is the Hox genes of Drosophila, which are 
active in embryonic and larval stages of the life of this organ- 
ism. These Hox genes have their homologous in mosquitoes 
and they display a pattern of H3K27me3 enrichment as broad 
peak blocks in differentiated cells of the midgut (Figure 4). It 
has been shown in Drosophila that H3K27me3 can be also be 
present in genes in euchromatic regions as focal peaks, rather 
than in domains, where there is a dynamic turnover between 
active and repressive histone marks and where these marks par- 
ticipate in the fine tuning of transcription (Riddle et al., 2011; 
Young et al., 2011). In the case of A. gambiae such a distribu- 
tion pattern would correspond to genes of cluster 2, which show 
intermediate levels of H3K27ac and H3K27me3 and medium to 
low transcription (Figure 2). Finally, genes containing H3K27ac 
show different enrichment levels in this histone PTM. Particularly 
interesting is the finding of a subset of genes that have this 



modification only at promoters and show intermediate levels of 
transcription, suggesting that RNA Polymerase II may be paused 
at these regions (Figure 2). This has been previously reported 
by ChlP-seq studies in flies and mammals that mapped RNA- 
polymerase binding patterns in association with various histone 
PTMs and regulatory elements genome-wide (Negre et al, 2011; 
Adelman and Lis, 2012). These findings together support the 
conclusion that mechanisms of regulation of chromatin struc- 
ture and function have been conserved between Anopheles and 
Drosophila. 

The functional analysis of genes differentially enriched in his- 
tone PTMs further supports the idea that histone modification 
profiles correlate with gene expression in A. gambiae. This find- 
ing, and the similarities found with the mechanisms of epigenetic 
regulation described in Drosophila, has potentially important 
implications in the context of infection and the response of 
mosquitoes to environmental stresses. We further explored this 
issue by examining the enrichment in H3K27 PTMs and the tran- 
scriptional status of a set of target genes that have been previously 
shown to be up-regulated in response to malaria infection as well 
as genes involved in insecticide resistance (referenced in Table 
S3). Interestingly, the majority of the candidate genes examined 
appear to be enriched in one of the two histone modifications, 
and most fall within clusters 1 and 2, which suggests that they 
are in a dynamic on/off switch for transcriptional regulation 
(Table S3). 

One limitation of epigenomic studies in non-model organ- 
isms is the availability of fully annotated genomes that allow 
high resolution mapping of epigenetic modifications and regu- 
latory elements to their target genes. This is clearly illustrated in 
this study, and may be one of the reasons why analysis of epi- 
genetic modifications has remained unexplored in mosquitoes. 
The analysis of the A. gambiae transcriptome in midguts and sali- 
vary glands using RNA-seq resulted in the identification of several 
thousand novel transcripts. The high discovery rate is not surpris- 
ing considering that the majority of transcriptome studies carried 
out to date in A. gambiae have employed microarrays (Baker et al, 
201 1; Maccallum et al., 201 1). It is important to note that a large 
fraction of regions enriched for H3K27ac and H3K27me3 map 
to non-annotated genes. The differential enrichment in histone 
modification marks and tissue specific signatures of transcription 
at these genes can be important in predicting potential targets 
in the response of A. gambiae to pathogens and environmental 
stressors. 

CONCLUSIONS AND FUTURE DIRECTIONS 

Here we present an integrative approach that combines genome- 
wide profiling of histone modifications and gene expression that 
allows the functional interpretation of chromatin modifications 
as well as de novo prediction of regulatory elements and genes. 
Results support a link between changes in histone modification 
profiles and transcription levels in A. gambiae, and this includes 
genes that have been previously identified as having important 
roles in the epidemiology and control of malaria transmission. 
The patterns obtained are similar to those previously described 
in Drosophila, suggesting the existence of common mechanisms 
of chromatin regulation in A. gambiae. These findings open new 
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opportunities to study the dynamics of chromatin states including 
other histone PTMs in various developmental stages and environ- 
mental conditions (i.e., infection or exposure to insecticides) as 
well as the mechanisms by which these states are controlled by 
regulatory elements such as enhancers and insulators. 
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Figure S1 | Fingerprint plot that siiows tlie distribution of reads, as tlie 
cumulative sum of read counts in 10 bp window bins, for H3K27ac, 
H3K27me3, and input samples. A tight diagonal is expected when reads 
are equally distributed across the genome, as is the case for the input, 
whereas the curve becomes more pronounced as the degree of 
enrichment increases and is more localized in the ChlPs relative to the 
control sample. 

Figure S2 | Piecharts tfiat sliow the number of sequences assigned to 
KEGG pathways for each histone modification gene cluster (see text. 
Figure 2). 
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